
AIDP OpenAI Demo example - demonstrating the NVIDIA AI Data Platform Retrieval API following the OpenAI Vector Store Search specification, exposed via MCP. #12

Open

PicoNVIDIA wants to merge 2 commits into NVIDIA:main from PicoNVIDIA:add-aidp-openai-demo


Conversation


@PicoNVIDIA PicoNVIDIA commented Jan 7, 2026

Description

AIDP OpenAI Demo example, demonstrating the NVIDIA AI Data Platform Retrieval API following the OpenAI Vector Store Search specification, exposed via MCP.

  • OpenAI-compatible search_vector_store tool via MCP
  • NVIDIA NIMs for embeddings (nv-embedqa-e5-v5) and LLM reasoning (llama-3.1-70b-instruct)
  • Milvus vector database integration

Summary by CodeRabbit

  • New Features

    • Added complete AIDP Retrieval API demo project with OpenAI-compatible vector store search functionality
    • Includes REST API service and MCP server implementation for semantic search on support tickets
    • Added data loading utilities and example queries demonstrating search, filtering, and result ranking capabilities
  • Documentation

    • Added comprehensive README with setup instructions, architecture overview, and usage examples
  • Chores

    • Added project configuration and workflow setup files


This example demonstrates the NVIDIA AI Data Platform (AIDP) Retrieval API
following the OpenAI Vector Store Search specification, exposed via MCP.

Features:
- OpenAI-compatible search_vector_store tool via MCP
- NVIDIA NIMs for embeddings and LLM reasoning
- Milvus vector database integration
- Synthetic data generation using Nemotron LLM
- Full ReAct agent workflow with NeMo Agent Toolkit

Signed-off-by: Patrick Moorhead <pmoorhead@nvidia.com>
- rest_api.py: OpenAI-compatible REST endpoint
- examples.py: Comprehensive API usage examples
- Updated pyproject.toml with fastapi/uvicorn deps
- Updated README with complete file list

Signed-off-by: Patrick Moorhead <pmoorhead@nvidia.com>

copy-pr-bot bot commented Jan 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@PicoNVIDIA (author) commented:

@CodeRabbit review full


coderabbitai bot commented Jan 7, 2026

@PicoNVIDIA: I'll perform a full review of all the changes in this PR.

✅ Actions performed

Full review triggered.


coderabbitai bot commented Jan 7, 2026

Walkthrough

This pull request introduces a complete example project demonstrating AIDP Retrieval API with OpenAI compatibility. It includes project configuration, a data loading script, FastAPI REST service, MCP server implementation, workflow configuration, and example usage patterns that integrate Milvus vector database with NVIDIA embedding APIs.

Changes

Cohort / File(s) Summary
Project Configuration & Documentation
examples/aidp_openai_demo/README.md, examples/aidp_openai_demo/pyproject.toml, examples/aidp_openai_demo/src/nat_aidp_openai_demo/__init__.py
Adds README with end-to-end setup and usage guidance; project metadata with dependencies (nvidia-nat, pymilvus, fastapi, uvicorn, requests); SPDX license headers.
Data Loading
examples/aidp_openai_demo/scripts/load_support_tickets.py
Adds script to seed Milvus with synthetic support tickets via NVIDIA LLM; includes embedding generation, collection creation, fallback ticket generation, and semantic search validation.
REST API Implementation
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py
Implements OpenAI-compatible FastAPI service with Pydantic models, Bearer token authentication, Milvus vector search, NVIDIA embeddings integration, filtering, and error handling; exposes /v1/vector_stores and /v1/vector_stores/{vector_store_id}/search endpoints.
MCP Server Implementation
examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py
Implements FastMCP HTTP server exposing search_vector_store tool; lazy Milvus client initialization, NVIDIA embedding retrieval, OpenAI-compatible response formatting, and comprehensive error handling.
Configuration & Examples
examples/aidp_openai_demo/src/nat_aidp_openai_demo/configs/workflow.yml, examples/aidp_openai_demo/src/nat_aidp_openai_demo/examples.py
Adds YAML workflow configuration for react agent with Llama 3.1 LLM; examples module demonstrates API usage via five example scenarios (basic search, filtering, category filtering, full response format, OpenAI alignment) with health checks and error handling.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant FastAPI as FastAPI REST Service
    participant Auth as Authentication
    participant Embedding as NVIDIA Embeddings API
    participant Milvus as Milvus Vector DB
    participant Format as Response Formatter

    Client->>FastAPI: POST /v1/vector_stores/{id}/search

    FastAPI->>Auth: verify_api_key(Bearer token)
    Auth-->>FastAPI: ✓ API Key Valid

    rect rgb(200, 220, 240)
        Note over FastAPI,Milvus: Vector Search Flow
        FastAPI->>Embedding: GET embedding for query text
        Embedding-->>FastAPI: embedding vector

        FastAPI->>Milvus: search(vector, limit, filters)
        Milvus-->>FastAPI: hits with distances & metadata
    end

    rect rgb(220, 240, 220)
        Note over FastAPI,Format: Response Formatting
        FastAPI->>Format: Apply score_threshold & filters
        Format->>Format: Convert distances to similarity scores
        Format-->>FastAPI: Filtered results list
    end

    FastAPI-->>Client: SearchResponse (OpenAI-compatible JSON)
```
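The request shape in that flow can be sketched as a small client helper. The endpoint path comes from this PR; the host, port, and API key values below are assumptions for local use:

```python
import json

def build_search_request(base_url: str, vector_store_id: str, query: str,
                         api_key: str, max_num_results: int = 5):
    """Build the OpenAI-compatible vector store search request.

    Sketch only: field names follow the Vector Store Search spec this
    PR targets; the Authorization header is what verify_api_key checks.
    """
    url = f"{base_url}/v1/vector_stores/{vector_store_id}/search"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {"query": query, "max_num_results": max_num_results}
    return url, headers, body

url, headers, body = build_search_request(
    "http://localhost:8000", "support_tickets", "GPU driver crash", "demo-key")
print(url)        # → http://localhost:8000/v1/vector_stores/support_tickets/search
print(json.dumps(body))
```

Sending this with `requests.post(url, headers=headers, json=body, timeout=30)` exercises the full diagrammed path.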

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title directly describes the main change: adding an AIDP OpenAI Demo example demonstrating the NVIDIA AI Data Platform Retrieval API, following the OpenAI Vector Store Search specification and exposed via MCP.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 95.24%, above the required 80.00% threshold.



@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 4

🤖 Fix all issues with AI agents
In @examples/aidp_openai_demo/scripts/load_support_tickets.py:
- Around line 44-61: The generate_embeddings function calls requests.post
without a timeout which can hang; update the call to include a reasonable
timeout (e.g., timeout=10) when posting to NVIDIA_EMBED_URL with headers from
get_nvidia_headers and json=payload, and wrap the request in a try/except to
catch requests.Timeout (and requests.RequestException) so you can log or
re-raise a clear error; ensure the modified code still calls
response.raise_for_status() and processes result["data"] into embeddings as
before.
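A hedged sketch of that change (the endpoint URL, timeout value, and error messages below are placeholders; the script defines its own `NVIDIA_EMBED_URL` and header helper):

```python
import requests

def generate_embeddings(texts, url, headers, model, timeout=60):
    """Post texts for embedding with a bounded wait; re-raise with context.

    Mirrors the reviewed function's payload, adding the timeout and
    try/except the review prompt asks for.
    """
    payload = {"input": texts, "model": model, "input_type": "passage",
               "encoding_format": "float", "truncate": "END"}
    try:
        response = requests.post(url, headers=headers, json=payload,
                                 timeout=timeout)
        response.raise_for_status()
    except requests.Timeout as err:
        raise RuntimeError(f"Embedding request timed out after {timeout}s") from err
    except requests.RequestException as err:
        raise RuntimeError(f"Embedding request failed: {err}") from err
    return [item["embedding"] for item in response.json()["data"]]
```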
- Around line 97-106: The LLM API call using requests.post(NVIDIA_LLM_URL,
headers=headers, json=payload) lacks a timeout and can hang; update the call in
load_support_tickets.py to pass a timeout parameter (e.g., timeout=120) to
requests.post and keep response.raise_for_status() as-is so long-running
generation is allowed but bounded; ensure you modify the line that creates the
response (the requests.post invocation) and test that timeouts propagate
cleanly.

In @examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py:
- Around line 286-293: The current CORS setup uses app.add_middleware with
CORSMiddleware and allow_origins=["*"] together with allow_credentials=True
which permits credentialed requests from any origin; change this for production
by replacing the wildcard with a configurable list of trusted origins (e.g.,
read ALLOWED_ORIGINS from an environment variable), then pass that list to
app.add_middleware (keeping allow_credentials=True if needed) so only explicitly
allowed origins are permitted; update any related configuration or startup code
to parse the ALLOWED_ORIGINS env var (split on commas) and use that variable in
the CORSMiddleware call.

In @examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py:
- Around line 139-145: The Milvus search call (client.search) can hang
indefinitely; update the search invocation that uses vector_store_id,
query_embedding and max_num_results to pass a per-operation timeout (e.g.,
timeout=30.0) so the call fails fast on unresponsive Milvus; add the timeout
parameter to the client.search(...) argument list and choose an appropriate
value consistent with other timeouts in the codebase.
🧹 Nitpick comments (11)
examples/aidp_openai_demo/README.md (1)

81-103: Consider minor markdown formatting improvements.

The documentation is comprehensive and well-structured. For improved markdown compliance and accessibility:

  1. Add language specifiers to fenced code blocks (lines 81, 141, 230) - helps with syntax highlighting
  2. Wrap the bare URL at line 175 in angle brackets: <http://localhost:3000>
  3. Line 177: Consider using a proper heading (###) instead of bold text for "Alternative: Command Line"

These changes improve document accessibility and comply with markdown best practices.

📝 Suggested markdown improvements

Line 81 - Add language identifier:

-```
+```text
 ┌─────────────┐         REST          ┌─────────────────┐

Line 141 - Add language identifier:

-```
+```text
 Creating collection: support_tickets with explicit schema

Line 175 - Wrap bare URL:

-Navigate to: http://localhost:3000
+Navigate to: <http://localhost:3000>

Line 177 - Use proper heading:

-**Alternative: Command Line**
+### Alternative: Command Line

Line 230 - Add language identifier:

-```
+```http
 POST /v1/vector_stores/{vector_store_id}/search

Also applies to: 141-146, 175-175, 230-232

examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py (3)

51-55: Consider binding restrictions for production deployments.

The server binds to all network interfaces (host="0.0.0.0"), making it accessible from any network the host is connected to. For a demo, this may be intentional for flexibility, but consider:

  • For production: bind to localhost or specific interface
  • Document the security implications in deployment guidance
  • Consider adding authentication/authorization if exposing externally

159-167: Improve filter parameter validation.

The filter logic requires ALL three parameters (filter_key, filter_type, filter_value) to be present. If a user provides only some of these parameters, they are silently ignored, which could be confusing.

Consider adding explicit validation:

🔍 Proposed improvement for filter validation
                # Apply filters
-               if filter_key and filter_type and filter_value:
+               # Validate filter parameters - all or none must be provided
+               filter_params_provided = [filter_key, filter_type, filter_value]
+               filter_params_count = sum(1 for p in filter_params_provided if p is not None)
+               
+               if filter_params_count > 0 and filter_params_count < 3:
+                   # Partial filter parameters provided - this is likely an error
+                   logger.warning(f"Partial filter parameters provided (count={filter_params_count}). All three (filter_key, filter_type, filter_value) are required for filtering.")
+               
+               if filter_params_count == 3:
                    attr_value = entity.get(filter_key, "")
                    if filter_type == "eq" and attr_value != filter_value:
                        continue
                    elif filter_type == "ne" and attr_value == filter_value:
                        continue
                    elif filter_type == "contains" and filter_value not in str(attr_value):
                        continue

200-204: Improve exception logging to include traceback.

The exception handler should use logging.exception() instead of logging.error() to automatically include the full traceback, which is valuable for debugging.

🔧 Proposed fix
    except Exception as e:
-       logger.error(f"Search error: {e}")
+       logger.exception(f"Search error: {e}")
        return json.dumps({
            "error": {"code": 500, "message": str(e)}
        })
examples/aidp_openai_demo/scripts/load_support_tickets.py (3)

1-4: Shebang placement issue.

The shebang (#!/usr/bin/env python3) should be at the very beginning of the file, before the SPDX license headers, to ensure the script is executable.

Suggested fix
+#!/usr/bin/env python3
 # SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
-
-#!/usr/bin/env python3

205-207: Remove redundant import.

DataType is already imported at line 18; the re-import here is unnecessary.

Proposed fix
 def create_collection(client: MilvusClient):
     """Create the support_tickets collection in Milvus with explicit schema."""
-    from pymilvus import DataType
-    

278-290: Add strict=True to zip() for safety.

Using zip(..., strict=True) would catch potential mismatches between tickets and embeddings list lengths, which could indicate a bug. Also, the loop variable i is unused.

Proposed fix
     import uuid
     data = []
-    for i, (ticket, embedding) in enumerate(zip(tickets, embeddings)):
+    for ticket, embedding in zip(tickets, embeddings, strict=True):
         data.append({
             "pk": str(uuid.uuid4()),  # String primary key
             "vector": embedding,
             "title": ticket["title"][:256],  # Truncate to fit schema
             "text": f"{ticket['title']}. {ticket['content']}"[:4096],  # Use 'text' field for retriever
             "category": ticket.get("category", "General")[:64],
             "severity": ticket.get("severity", "medium")[:32],
             "status": ticket.get("status", "open")[:32]
         })
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py (2)

1-4: Shebang placement issue.

Same as in the other file—move the shebang to line 1 for proper script execution.


385-394: Improve error handling and logging.

The exception handling can be improved:

  1. Use logger.exception() to capture the full traceback
  2. Chain the exception with from e to preserve context for debugging
Proposed fix
     try:
         return retrieval_service.search(vector_store_id, request)
     except HTTPException:
         raise
     except Exception as e:
-        logger.error(f"Search error: {e}")
+        logger.exception("Search error")
         raise HTTPException(
             status_code=500,
-            detail=f"Internal system error: {str(e)}"
-        )
+            detail=f"Internal system error: {e!s}"
+        ) from e
examples/aidp_openai_demo/src/nat_aidp_openai_demo/examples.py (2)

1-4: Shebang placement issue.

Same fix needed—move shebang to line 1.


333-350: Simplify health check logic.

The exception handling can be cleaner. Since response.raise_for_status() isn't called, a non-200 status doesn't raise an exception, making the manual check necessary. Consider simplifying:

Proposed fix
     try:
         # Check if API is available
         response = requests.get(f"{AIDP_API_BASE}/", timeout=5)
-        if response.status_code != 200:
-            raise Exception("API not available")
-        
+        response.raise_for_status()
         print(f"✓ AIDP API Server running at {AIDP_API_BASE}")
         
-    except Exception as e:
+    except requests.exceptions.RequestException:
         print(f"""
 ✗ AIDP API Server not running at {AIDP_API_BASE}
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7406837 and 022d366.

📒 Files selected for processing (8)
  • examples/aidp_openai_demo/README.md
  • examples/aidp_openai_demo/pyproject.toml
  • examples/aidp_openai_demo/scripts/load_support_tickets.py
  • examples/aidp_openai_demo/src/nat_aidp_openai_demo/__init__.py
  • examples/aidp_openai_demo/src/nat_aidp_openai_demo/configs/workflow.yml
  • examples/aidp_openai_demo/src/nat_aidp_openai_demo/examples.py
  • examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py
  • examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py
🧰 Additional context used
🧬 Code graph analysis (3)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py (1)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py (1)
  • search (169-259)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py (1)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/examples.py (1)
  • search_vector_store (79-130)
examples/aidp_openai_demo/scripts/load_support_tickets.py (1)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py (1)
  • search (169-259)
🪛 Gitleaks (8.30.0)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py

[high] 413-415: Discovered a potential authorization token provided in a curl command header, which could compromise the curl accessed resource.

(curl-auth-header)

🪛 markdownlint-cli2 (0.18.1)
examples/aidp_openai_demo/README.md

81-81: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


141-141: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


175-175: Bare URL used

(MD034, no-bare-urls)


177-177: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


230-230: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.14.10)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py

4-4: Shebang should be at the beginning of the file

(EXE005)


53-53: Possible binding to all interfaces

(S104)


73-73: Avoid specifying long messages outside the exception class

(TRY003)


200-200: Do not catch blind exception: Exception

(BLE001)


201-201: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py

4-4: Shebang should be at the beginning of the file

(EXE005)


149-149: Avoid specifying long messages outside the exception class

(TRY003)


271-271: Unused function argument: app

(ARG001)


345-345: Unused function argument: api_key

(ARG001)


373-373: Unused function argument: api_key

(ARG001)


389-389: Do not catch blind exception: Exception

(BLE001)


390-390: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


391-394: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


393-393: Use explicit conversion flag

Replace with conversion flag

(RUF010)


419-419: Possible binding to all interfaces

(S104)

examples/aidp_openai_demo/src/nat_aidp_openai_demo/examples.py

4-4: Shebang should be at the beginning of the file

(EXE005)


337-337: Abstract raise to an inner function

(TRY301)


337-337: Create your own exception

(TRY002)


337-337: Avoid specifying long messages outside the exception class

(TRY003)


341-341: Do not catch blind exception: Exception

(BLE001)


341-341: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

examples/aidp_openai_demo/scripts/load_support_tickets.py

4-4: Shebang should be at the beginning of the file

(EXE005)


36-37: Avoid specifying long messages outside the exception class

(TRY003)


56-56: Probable use of requests call without timeout

(S113)


105-105: Probable use of requests call without timeout

(S113)


121-121: Consider moving this statement to an else block

(TRY300)


207-207: Redefinition of unused DataType from line 18

Remove definition: DataType

(F811)


280-280: Loop control variable i not used within loop body

Rename unused i to _i

(B007)


280-280: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


346-346: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (17)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/__init__.py (1)

1-3: LGTM!

Appropriate SPDX license headers for the package initialization file.

examples/aidp_openai_demo/src/nat_aidp_openai_demo/configs/workflow.yml (1)

18-43: LGTM!

The workflow configuration is well-structured:

  • MCP client correctly points to the server on port 8081 (matches server.py)
  • LLM configuration uses appropriate NVIDIA NIM model with conservative temperature (0) for consistent reasoning
  • Agent setup includes sensible retry logic (max_retries: 2, retry_parsing_errors: true)
examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py (2)

70-91: LGTM! Good timeout and error handling.

The embedding function includes:

  • Appropriate 30-second timeout for the external API call
  • Proper authentication header handling
  • Error propagation via raise_for_status() for HTTP errors
  • Network errors (connection issues, timeouts) will propagate to the caller's exception handler

207-221: LGTM!

The entry point provides clear server information and correctly starts the MCP server with the streamable-http transport. The banner helpfully displays the endpoint URL, tool name, and Milvus connection details.

examples/aidp_openai_demo/pyproject.toml (1)

11-18: No issues found with dependency versions.

All specified dependencies are available on PyPI and appropriately constrained. The nvidia-nat[langchain,mcp]>=1.4.0a0,<1.5.0 constraint matches multiple actively maintained alpha, beta, and RC builds (the specific version 1.4.0a0 does not exist, but the constraint correctly captures all qualifying versions from 1.4.0a20251008 onward). The pymilvus~=2.6 constraint is satisfied by stable releases 2.6.0 through 2.6.6. For an example project, using alpha versions is appropriate and expected.

examples/aidp_openai_demo/scripts/load_support_tickets.py (3)

343-356: LGTM with note on exception handling.

The broad Exception catch on line 346 is intentional to gracefully fall back to static tickets when LLM generation fails for any reason. This is appropriate for a demo/seeding script.


129-202: LGTM!

Good fallback data with realistic support ticket examples covering various NVIDIA product categories.


238-266: LGTM!

Random vector generation with proper normalization for demo mode is appropriate.
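For reference, unit-norm random vectors for demo mode can be produced with nothing beyond the standard library (the dimension and seed here are assumptions, not values from the script):

```python
import math
import random

def random_unit_vector(dim=1024, seed=None):
    """Generate a random Gaussian vector scaled to unit L2 norm,
    so cosine-style similarity behaves sensibly in demo searches."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

vec = random_unit_vector(dim=8, seed=42)
print(round(math.sqrt(sum(x * x for x in vec)), 6))  # → 1.0
```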

examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py (6)

300-326: LGTM for demo purposes.

The authentication correctly enforces Bearer token format. The comment on lines 317-318 appropriately documents this is demo-only validation. The api_key parameters in endpoints are not unused—they trigger the verify_api_key dependency for authentication enforcement.
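The Bearer-format check that dependency performs can be illustrated in isolation. This is a pure-Python sketch with assumed names; the real code wires the check through FastAPI's `Depends` and raises `HTTPException(status_code=401)` instead of the stand-in error:

```python
class AuthError(Exception):
    """Stand-in for FastAPI's HTTPException(status_code=401)."""

def verify_api_key(authorization: str) -> str:
    """Require an 'Authorization: Bearer <key>' header and return the key.

    Demo-only: the header format is enforced, the key itself is not
    validated against any store.
    """
    scheme, _, key = authorization.partition(" ")
    if scheme != "Bearer" or not key:
        raise AuthError("Expected 'Bearer <api-key>' authorization header")
    return key

print(verify_api_key("Bearer demo-key"))  # → demo-key
```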


146-167: LGTM!

Good implementation with proper timeout (30s) on the external API call and appropriate error handling.


169-259: LGTM with optimization note.

The search implementation is correct. The post-filtering approach (lines 214-230) works but could be optimized by pushing filters to Milvus using filter parameter in the search call for better performance with large datasets. Current approach is fine for demo/small scale.
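The post-filtering step can be sketched as a pure function over Milvus hits (field names follow the demo's schema; pushing the same predicate down to Milvus would instead pass a boolean expression such as `filter='category == "Networking"'` to `client.search`):

```python
def post_filter(hits, filter_key=None, filter_type=None, filter_value=None):
    """Drop hits that fail the (key, type, value) predicate.

    Passes everything through when no complete filter is supplied,
    mirroring the demo's all-or-nothing filter handling.
    """
    if not (filter_key and filter_type and filter_value):
        return list(hits)
    kept = []
    for entity in hits:
        attr = entity.get(filter_key, "")
        if filter_type == "eq" and attr != filter_value:
            continue
        if filter_type == "ne" and attr == filter_value:
            continue
        if filter_type == "contains" and filter_value not in str(attr):
            continue
        kept.append(entity)
    return kept

hits = [{"category": "Networking"}, {"category": "Drivers"}]
print(post_filter(hits, "category", "eq", "Networking"))  # → [{'category': 'Networking'}]
```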


71-127: LGTM!

Pydantic models are well-structured with appropriate validation constraints and follow the OpenAI API specification.


401-419: LGTM!

Binding to 0.0.0.0 is appropriate for containerized deployment. The Gitleaks warning about the curl example is a false positive—it's documentation showing usage patterns, not an actual secret.


270-277: LGTM!

The app parameter is required by FastAPI's lifespan context manager protocol—the static analysis warning is a false positive.

examples/aidp_openai_demo/src/nat_aidp_openai_demo/examples.py (3)

79-130: LGTM!

Clean implementation with proper timeout handling and correct OpenAI-compatible request format.


133-311: LGTM!

Well-structured examples that clearly demonstrate the API usage patterns and OpenAI alignment.


60-62: LGTM!

Good use of environment variables with sensible defaults for demo purposes.

Comment on lines +44 to +61
def generate_embeddings(texts: List[str]) -> List[List[float]]:
"""Generate embeddings using NVIDIA NIM."""
headers = get_nvidia_headers()

payload = {
"input": texts,
"model": EMBEDDING_MODEL,
"input_type": "passage",
"encoding_format": "float",
"truncate": "END"
}

response = requests.post(NVIDIA_EMBED_URL, headers=headers, json=payload)
response.raise_for_status()

result = response.json()
embeddings = [item["embedding"] for item in result["data"]]
return embeddings
⚠️ Potential issue | 🟠 Major

Add timeout to external API call.

The requests.post() call lacks a timeout, which could cause the script to hang indefinitely if the NVIDIA API is unresponsive. This is flagged by static analysis (S113).

Proposed fix
-    response = requests.post(NVIDIA_EMBED_URL, headers=headers, json=payload)
+    response = requests.post(NVIDIA_EMBED_URL, headers=headers, json=payload, timeout=60)
🧰 Tools
🪛 Ruff (0.14.10)

56-56: Probable use of requests call without timeout

(S113)

🤖 Prompt for AI Agents
In @examples/aidp_openai_demo/scripts/load_support_tickets.py around lines 44 -
61, The generate_embeddings function calls requests.post without a timeout which
can hang; update the call to include a reasonable timeout (e.g., timeout=10)
when posting to NVIDIA_EMBED_URL with headers from get_nvidia_headers and
json=payload, and wrap the request in a try/except to catch requests.Timeout
(and requests.RequestException) so you can log or re-raise a clear error; ensure
the modified code still calls response.raise_for_status() and processes
result["data"] into embeddings as before.

Comment on lines +97 to +106
payload = {
"model": LLM_MODEL,
"messages": [{"role": "user", "content": prompt}],
"temperature": 0.8,
"max_tokens": 8000
}

print(f"Generating {num_tickets} synthetic support tickets using {LLM_MODEL}...")
response = requests.post(NVIDIA_LLM_URL, headers=headers, json=payload)
response.raise_for_status()

⚠️ Potential issue | 🟠 Major

Add timeout to LLM API call.

Similar to the embeddings call, this requests.post() to the LLM endpoint should include a timeout to prevent indefinite hangs. Given that LLM generation may take longer, a higher timeout (e.g., 120s) is appropriate.

Proposed fix
     payload = {
         "model": LLM_MODEL,
         "messages": [{"role": "user", "content": prompt}],
         "temperature": 0.8,
         "max_tokens": 8000
     }
     
     print(f"Generating {num_tickets} synthetic support tickets using {LLM_MODEL}...")
-    response = requests.post(NVIDIA_LLM_URL, headers=headers, json=payload)
+    response = requests.post(NVIDIA_LLM_URL, headers=headers, json=payload, timeout=120)
🧰 Tools
🪛 Ruff (0.14.10)

105-105: Probable use of requests call without timeout

(S113)

🤖 Prompt for AI Agents
In @examples/aidp_openai_demo/scripts/load_support_tickets.py around lines 97 -
106, The LLM API call using requests.post(NVIDIA_LLM_URL, headers=headers,
json=payload) lacks a timeout and can hang; update the call in
load_support_tickets.py to pass a timeout parameter (e.g., timeout=120) to
requests.post and keep response.raise_for_status() as-is so long-running
generation is allowed but bounded; ensure you modify the line that creates the
response (the requests.post invocation) and test that timeouts propagate
cleanly.

Comment on lines +286 to +293
# Add CORS middleware
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
⚠️ Potential issue | 🟡 Minor

Review CORS configuration for production use.

Using allow_origins=["*"] with allow_credentials=True allows any origin to make credentialed requests. While acceptable for local development/demo, this should be restricted in production to specific trusted origins.

Suggested approach for production
# For production, specify allowed origins explicitly
ALLOWED_ORIGINS = os.getenv("ALLOWED_ORIGINS", "http://localhost:3000").split(",")

app.add_middleware(
    CORSMiddleware,
    allow_origins=ALLOWED_ORIGINS,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
🤖 Prompt for AI Agents
In @examples/aidp_openai_demo/src/nat_aidp_openai_demo/rest_api.py around lines
286 - 293, The current CORS setup uses app.add_middleware with CORSMiddleware
and allow_origins=["*"] together with allow_credentials=True which permits
credentialed requests from any origin; change this for production by replacing
the wildcard with a configurable list of trusted origins (e.g., read
ALLOWED_ORIGINS from an environment variable), then pass that list to
app.add_middleware (keeping allow_credentials=True if needed) so only explicitly
allowed origins are permitted; update any related configuration or startup code
to parse the ALLOWED_ORIGINS env var (split on commas) and use that variable in
the CORSMiddleware call.

Comment on lines +139 to +145
# Search Milvus
search_results = client.search(
collection_name=vector_store_id,
data=[query_embedding],
limit=max_num_results,
output_fields=["pk", "title", "text", "category", "severity", "status"]
)
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py | sed -n '130,155p'

Repository: NVIDIA/NeMo-Agent-Toolkit-Examples

🌐 Web query:

pymilvus MilvusClient timeout configuration search operations

💡 Result:

Short answer: set a timeout per operation by passing timeout to search(), or set a client connection timeout in MilvusClient(...), which only applies to establishing the connection and is not a default for all RPCs.

Examples

  • Per-search timeout:
    client.search(collection_name="mycol", data=vectors, limit=10, timeout=5.0). [1]

  • Client constructor timeout (connection timeout only):
    client = MilvusClient(uri="http://localhost:19530", timeout=10)  # used for establishing the connection. [2][3]

Notes

  • Most MilvusClient methods (search, query, load_collection, list_collections, etc.) accept a timeout: Optional[float] parameter that controls that RPC's deadline (None = wait until response/error). [1][3]

Sources
[1] pymilvus search() API reference. (milvus.io)
[2] Connect/Configure a timeout (client connection) in Milvus docs. (blog.milvus.io)
[3] MilvusClient(...) constructor / timeout parameter. (milvus.io)

🏁 Script executed:

# Search for timeout patterns in the codebase
rg -i "timeout" examples/aidp_openai_demo/ -B 2 -A 2

🏁 Script executed:

# Check imports and client initialization
cat -n examples/aidp_openai_demo/src/nat_aidp_openai_demo/server.py | sed -n '1,50p'


Add timeout parameter to Milvus search operation to prevent indefinite hangs.

The client.search() call at lines 140-145 lacks a timeout configuration. If the Milvus server becomes unresponsive, the operation will block indefinitely. The pymilvus library supports per-operation timeouts via the timeout parameter.

Consider adding a timeout to the search operation, for example:

search_results = client.search(
    collection_name=vector_store_id,
    data=[query_embedding],
    limit=max_num_results,
    output_fields=["pk", "title", "text", "category", "severity", "status"],
    timeout=30.0
)

This aligns with the timeout patterns already used elsewhere in the codebase (e.g., 30s for API calls). Adjust the timeout value based on expected Milvus response time for your dataset and hardware.
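One way to keep that timeout in a single place is a thin wrapper (a sketch; the name and the error mapping are illustrative, not pymilvus API) that forwards a per-operation timeout and re-raises failures with a clearer message:

```python
def search_with_timeout(search_fn, *, timeout: float = 30.0, **kwargs):
    """Invoke a search callable with an explicit per-operation timeout.

    Assumes search_fn accepts a `timeout` keyword, as pymilvus
    MilvusClient.search does; any failure (including an expired RPC
    deadline) is re-raised with the timeout recorded in the message.
    """
    try:
        return search_fn(timeout=timeout, **kwargs)
    except Exception as exc:
        raise RuntimeError(
            f"vector store search failed within {timeout}s: {exc}"
        ) from exc
```

Call sites then stay close to the original, e.g. `search_with_timeout(client.search, collection_name=vector_store_id, data=[query_embedding], limit=max_num_results)`.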


@PicoNVIDIA
Contributor Author

@AnuradhaKaruppiah, please review when you have time.

Thanks for the help.

@AnuradhaKaruppiah
Contributor

/ok to test 022d366

@willkill07 willkill07 added feature request New feature or request non-breaking Non-breaking change labels Jan 27, 2026
@willkill07 willkill07 self-assigned this Jan 27, 2026
Member

Should this live under src/ if it's intended to be run by the user?

Member

Why not expose these tools through NAT? nat mcp serve --config_file <>

This is non-blocking, but it would be nice to see tighter integration.

I also recommend opening an Issue+PR on NeMo Agent Toolkit that dynamically adds vector store endpoints to the FastAPI front-end. /v1/vector_stores/{vector_store_id}/search

We can add an optional list of RetrieverRefs to the FastAPI configuration :)

Contributor Author

Some partners do not want to use NAT, only the server spec; I used NAT as a reference and recommended it when they didn't already have one.

Member

I understand that, but this is also a NeMo Agent Toolkit Examples repository.

requires-python = ">=3.11"

dependencies = [
    "nvidia-nat[langchain,mcp]>=1.4.0a0,<1.5.0",
Member

We follow semantic versioning as closely as possible, so whatever you pin now should keep working until 2.0.0.

Suggested change
"nvidia-nat[langchain,mcp]>=1.4.0a0,<1.5.0",
"nvidia-nat[langchain,mcp]>=1.4.0a0,<2.0.0",

Comment on lines +14 to +17
    "fastmcp",
    "fastapi",
    "uvicorn",
    "requests",
Member

Leaving all of these unversioned seems odd. Maybe add a comment that says # versions determined by NeMo Agent Toolkit
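For example, the dependency block could carry that comment directly (a sketch of the pyproject fragment, with the transitive versions intentionally left to NAT's pins):

```toml
dependencies = [
    "nvidia-nat[langchain,mcp]>=1.4.0a0,<2.0.0",
    # versions determined by NeMo Agent Toolkit
    "fastmcp",
    "fastapi",
    "uvicorn",
    "requests",
]
```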

version = "0.1.0"
description = "AIDP Retrieval API Demo with NVIDIA NIMs - OpenAI Vector Store Search specification via MCP"
readme = "README.md"
requires-python = ">=3.11"
Member

Suggested change
requires-python = ">=3.11"
requires-python = ">=3.11,<3.14"


```bash
export NVIDIA_API_KEY=<YOUR_API_KEY>
nat serve --config_file src/nat_aidp_openai_demo/configs/workflow.yml --port 8000
```
Member

@willkill07 willkill07 Jan 27, 2026

It is normal for examples to have a symbolic link from configs -> src/<package>/configs

That way you can have:

--config_file configs/workflow.yml
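That symlink convention can be set up like this (a sketch, run from the example's root directory; the package path is taken from the commands above):

```shell
# Link the repo-convention configs/ location to the packaged configs.
# ln -s succeeds even if the target does not exist yet, so order does not matter.
ln -s src/nat_aidp_openai_demo/configs configs
```

After that, `nat serve --config_file configs/workflow.yml` resolves to the same file as the longer path.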

@willkill07
Member

@PicoNVIDIA could you please address the review feedback?
